@Tolriq
Member

@Tolriq Tolriq commented Jul 15, 2025

New extension to have proper transcoding solution in OpenAPI.

This is a WIP to start the many discussions that this will bring.

@netlify

netlify bot commented Jul 15, 2025

Deploy Preview for opensubsonic ready!

Name Link
🔨 Latest commit 7c33949
🔍 Latest deploy log https://app.netlify.com/projects/opensubsonic/deploys/692caad1f713810008a01876
😎 Deploy Preview https://deploy-preview-168--opensubsonic.netlify.app

@Tolriq Tolriq marked this pull request as draft July 15, 2025 09:03
Contributor

@kgarner7 kgarner7 left a comment

Only reviewed the markdown. General thoughts:

  1. It's extremely important to describe what happens when you specify multiple transcoding profiles. Which one will the server return (it can only return one)?
  2. I'm a bit worried that the limitations would be a bit too heavy for a server. Not entirely sure.
  3. Some parts (codec, container) could probably be made more explicit on the format

All things said, I do think this is a good first pass. I'm worried that it's a bit heavy for the server (especially if a large number of profiles/codecs/limitations is provided), but it should be good especially for mobile clients.

@Tolriq
Member Author

Tolriq commented Jul 20, 2025

It's extremely important to describe the behavior of what happens when you specify multiple transcoding profiles. Which one will the server return (it can only return one).

The transcoding profiles are listed in order of preference and the server returns the first one it can satisfy. The details are in the transcode decision response.

I'm a bit worried that the limitations would be a bit too heavy for a server. Not entirely sure.

Limitations are the core of the decision process. Plex, Emby and Jellyfin all work with the same concepts, and it's relatively easy to manage server side.

Some parts (codec, container) could probably be made more explicit on the format

There was some discussion some time ago about returning details about the media, and we did not reach a consensus.

I agree a consensus would be better, as it would also be used for the details returned for tracks. Let's hope people can agree on something.

@lachlan-00
Member

This seems really complicated.

Is this more helpful for video servers like jellyfin where you can select various stream outputs from a list of available options?

@Tolriq
Member Author

Tolriq commented Jul 23, 2025

This is not complicated ;)
This is necessary to have proper transcoding support for proper audio quality.

There are a lot of details in the discussion part.

Clients should have control on what quality they want. If I can't play some format like DSD, I do not want to receive low quality MP3, I want hi res FLAC.

When I cast to a Sonos device that does not support my FLAC 24/96, I want to receive FLAC 24/48 and not low quality mp3 or opus.

Same when casting to chromecast and a million other cases.

All major media providers have such an API and this is the main missing part of Subsonic for audiophiles and casting.

@lachlan-00
Member

that explains why i'm not following it well, i listen to whatever source i'm given. but that makes sense now

@epoupon epoupon mentioned this pull request Aug 15, 2025
@Tolriq
Member Author

Tolriq commented Sep 5, 2025

@opensubsonic/servers So holidays are now mostly over :) It would be nice to get any objections or remarks on the draft now, before the final polish, rather than having to drop / rewrite everything later.

@sentriz
Member

sentriz commented Sep 10, 2025

+1 on the complicated topic. I don't understand why we couldn't get by with extending the current stream.view?format= param

this param has been around for years, but it's been ambiguous for servers outside the original subsonic server

but I see it as just a key for the format the client is requesting, if we had something like

getFormats.view which returned say

{
"format_1": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...},
"format_2": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...},
"format_3": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...}
}

these represent the possible formats a server can transcode to

the client can choose to use a format or not on its own without the server's knowledge

the song's original bitrate/sampleRate/etc is already known from the Child response:

[image: screenshot of a Child response]

so the client sees there is a transcode option resulting in a bitrate < the song's bitrate, and in a codec it can play. it can choose a format, format_1 for example

then request it with stream.view?format=format_1

this is also backwards compatible with the original subsonic server and only a small extension


so i wonder, which problem does this solution fail to address?
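
A minimal sketch of the client-side choice described above, assuming a hypothetical getFormats.view payload keyed by format name. The payload contents, the `pick_format` helper and the "highest bitrate that is still below the source" rule are illustrative assumptions, not part of any agreed API.

```python
from typing import Optional

# Hypothetical getFormats.view payload; values are made up for illustration.
FORMATS = {
    "format_1": {"codec": "mp3", "bitrate": 320},
    "format_2": {"codec": "opus", "bitrate": 128},
    "format_3": {"codec": "flac", "bitrate": 900},
}


def pick_format(formats: dict, song_bitrate: int, playable_codecs: set) -> Optional[str]:
    """Return the name of the highest-bitrate format that the client can
    play and whose bitrate is below the song's bitrate, or None."""
    candidates = [
        (fmt["bitrate"], name)
        for name, fmt in formats.items()
        if fmt["codec"] in playable_codecs and fmt["bitrate"] < song_bitrate
    ]
    if not candidates:
        return None  # nothing suitable: stream the original instead
    return max(candidates)[1]
```

The returned key would then be passed as the format parameter of stream.view.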

@sentriz
Member

sentriz commented Sep 10, 2025

also as a side, how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

@lachlan-00
Member

also as a side, how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

I've always treated these as the default output when running stream/download without additional options. A user-selectable format wouldn't affect the default outputs in that case.

@Tolriq
Member Author

Tolriq commented Sep 11, 2025

I've already given a dozen examples about the need and why and all the other media providers providing such API because it is necessary to address a lot of cases.

Transcoding is not about a short predefined list on the server, it's about having control over the output to get the best result for the user.

Again, if I want to cast my hi-res FLAC 24/96 to my Sonos device I want hi-res FLAC 24/48 to have the best sound. I do not want to have to choose between mp3 and opus because those are the only 2 default values the server has.

I also do not want my DSD files transcoded to mp3 or to have to force a bitrate I want a format that I support.

xHE AAC, ... and so many different needs depending on the player and the cast target. When I cast to the phone I want 2 channels, when I cast to my hi end AVR I want to keep the 6 channels.

Each device, UPnP renderer, Chromecast, ... will have a unique list of supported combinations of parameters, this can't be handled with 3 pre defined profiles on the server.

And the details from Child are not precise enough: mime and suffix are more about the container than about actual detailed codec information.

also as a side, how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

This does not change the fact that they are arbitrary values, as servers already supported multiple profiles ;) Most servers report them as the default transcoded result, used when the user does not request a transcode but the server forces one.

That is also an issue currently: if a server decides to transcode on its own due to its internal settings, users are not really aware of it, and we can't properly use the seek extension to seek in those transcodes.

TL;DR: The current solution is very limited and, while it may fit some basic needs, it's not the proper, mature streaming solution that OpenSubsonic needs to compete with the rest of the ecosystem.

@sentriz
Member

sentriz commented Sep 11, 2025

I've already given a dozen examples about the need and why and all the other media providers providing such API because it is necessary to address a lot of cases.

Transcoding is not about a couple of pre defined server list, it's about having control of the result for the best result for the user.

Again if I want to cast my hires FLAC 24/96 to my Sonos device I want hi res FLAC 24/48 to have the best sound. I do not want to have to choose between mp3 and opus because that's the only 2 default values the server have.

I also do not want my DSD files transcoded to mp3 or to have to force a bitrate I want a format that I support.

I'm not talking about 3 formats. There could be 10s or 100s of them. All the possible codecs, sample rates, channels, bitdepths. The server has the control here to
not show combinations of parameters which aren't possible or don't make sense.

So if you want FLAC 24/48, you choose that option. If that would be upsampling, you don't choose it

for example an incomplete list of formats:

{ "name": "flac_24_48k", "codec": "flac", "bitDepth": 24, "sampleRate": 48000 },
{ "name": "flac_16_44k", "codec": "flac", "bitDepth": 16, "sampleRate": 44100 },
{ "name": "opus_192",    "codec": "ogg", "bitRate": 192 }, // lossy, no bit depth or sample rate
{ "name": "opus_128",    "codec": "ogg", "bitRate": 128 }, // lossy, no bit depth or sample rate

Note how we don't show sampleRates and bitDepths for lossy formats. That's something the server needs to control

xHE AAC, ... and so many different needs depending on the player and the cast target. When I cast to the phone I want 2 channels, when I cast to my hi end AVR I want to keep the 6 channels.

Each device, UPnP renderer, Chromecast, ... will have a unique list of supported combinations of parameters, this can't be handled with 3 pre defined profiles on the server.

This can still be supported, with the above stuff

This proposal has the benefit of actually being feasible to implement, for servers.

And the details from Child are not precise enough mime and suffix are more about container than actual detailed codec information.

Then we can enhance this information, if it's not enough, and in a backwards compatible way. This info would be needed for this "format=" approach so that the client can correctly choose the format it wants by comparing these values to the getFormats values

@Tolriq
Member Author

Tolriq commented Sep 11, 2025

All the possible codecs, sample rates, channels, bitdepths.

This is not 10 or 100, this is multiple thousands of combinations: protocol, codecs, subCodec, containers, bitdepth, samplerate, channels. Without even talking about bitrate.
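
To put a rough number on this, here is a small sketch that multiplies out the number of values per parameter. The counts are purely illustrative assumptions (they do not come from any server), and not every combination would be valid in practice.

```python
import math

# Illustrative number of supported values per parameter (assumed, not measured).
OPTION_COUNTS = {
    "protocol": 2,
    "codec": 6,
    "container": 4,
    "bitDepth": 3,
    "sampleRate": 6,
    "channels": 3,
}


def combination_count(option_counts: dict) -> int:
    """Upper bound on distinct formats if every value combines with every other."""
    return math.prod(option_counts.values())


print(combination_count(OPTION_COUNTS))  # 2592
```

Even with these modest counts, and before bitrate is considered, the upper bound is already in the thousands.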

This proposal has the benefit of actually being feasible to implement, for servers.

So if this proposal, which is actually implemented by Plex, Emby and Jellyfin, is not possible to implement, how did they do it?
This proposal is actually not hard to implement, and if you don't think you can do it then do not implement the extension.

I'm sorry, but listing thousands of combinations makes absolutely no sense. Either we implement a proper transcoding engine or we don't. But what you propose is not a solution to the need of the users and the clients.

If the server is able to automatically generate the list of the thousands of combinations, then it can easily implement this feature as proposed, in a proper way. If it's not capable of that and you need to enter them manually, then this server will not be able to fit the users' needs either.

@gravelld
Member

getTranscodeDecision is a little clumsy for a term. How about *[T|t]ranscodeDecision -> *[T|t]ranscodeStreams? i.e. getTranscodeStreams returns a transcodeStream object

In the case of Plex et al, is it implemented this way because there's an implied control over the client? i.e. is knowledge about the client embedded in the server side code that creates the decision? One example: if there are multiple competing equivalent codecs specified, say flac and alac, how is it decided which is returned? A client may want to override that decision if the alternatives are essentially equivalent to the decision strategy. If it's to do with ordering in the query, this needs documenting.

I guess I'm not clear on the sort of control that can be exerted by the client.

@Tolriq
Member Author

Tolriq commented Sep 15, 2025

100% of the control is with the client: it gives a list of everything it can directly play and a list of wanted transcode profiles IN ORDER. If the media fits a direct play profile then the server says you can direct play; otherwise it takes the transcoding profiles in order, finds the first one it can do, and returns the necessary data for it.
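
The decision flow described here can be sketched roughly as follows. This is a simplified illustration, not the extension's schema: the dataclasses, their field names, the `decide` function and the `server_encoders` set are all hypothetical, and real profiles carry more fields and limitations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Media:
    container: str
    codec: str
    channels: int


@dataclass
class DirectPlayProfile:
    containers: list[str]
    codecs: list[str]
    max_channels: int


@dataclass
class TranscodingProfile:
    container: str
    codec: str


def decide(
    media: Media,
    direct: list[DirectPlayProfile],
    transcoding: list[TranscodingProfile],
    server_encoders: set[str],
) -> tuple[str, Optional[TranscodingProfile]]:
    # 1. Direct play if the media fits any direct play profile.
    for p in direct:
        if (media.container in p.containers
                and media.codec in p.codecs
                and media.channels <= p.max_channels):
            return ("direct_play", None)
    # 2. Otherwise take the first transcoding profile, in client order,
    #    that the server is able to produce.
    for t in transcoding:
        if t.codec in server_encoders:
            return ("transcode", t)
    return ("unsupported", None)
```

For example, a DSD file sent to a client that only direct plays FLAC, with transcoding profiles ordered FLAC then MP3, yields a FLAC transcode as long as the server can encode FLAC.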

The terms are related to the function. The first one asks for a decision, which can contain a transcode, and the second is there to actually get the transcoded content, like stream.view; it does not return an object.

So IMO getTranscodeDecision is coherent with all the other get endpoints. transcodeStream can be renamed to match the current stream; both make sense (like we have getLyrics or getCaptions to extract data from a track).

@epoupon
Member

epoupon commented Oct 27, 2025

In order to better understand this PR, I spent a couple of hours implementing it.
At first, I found it a bit overcomplicated, but it eventually made sense to me.
So I think it would really be a nice addition to the OS API, and it is not that complicated to implement on the server side.

Here’s what I noted:

  • Mismatch between songId in getTranscodeDecision and trackID in getTranscodeStream.
  • "maxAudioChannels" could possibly be shifted from DirectPlayProfile / TranscodingProfile to ClientInfo directly (seems to have the same usage as for the max bitrate).
  • Make it clear we don’t expect multiple values in the CodecProfile structs.
  • It looks like Jellyfin may mix up container names with file extensions; "opus" would be a valid container, considered to be "ogg". Not sure we want this.
  • For maxAudioBitrate and maxTranscodingAudioBitrate, I guess no value means no limit (should be written down)? Or make it mandatory but 0 means no limit?
  • Not sure about offset in the getTranscodeDecision endpoint since we can also set it in getTranscodeStream. Is the latter an offset to apply on top of the first one? An override? I’d just remove offset from getTranscodeDecision (it’s not part of the decision anyway).
  • "transcodeReasons" is an array, but it’s not clear which reason applies to which direct play profile or codec profile.
    Looks like we also need AudioBitdepthNotSupported, which is currently missing.

@Tolriq
Member Author

Tolriq commented Oct 28, 2025

Mismatch between songId in getTranscodeDecision and trackID in getTranscodeStream.

Yes.

"maxAudioChannels" could possibly be shifted from DirectPlayProfile / TranscodingProfile to ClientInfo directly (seems to have the same usage as for the max bitrate).

As explained, it's not at the top level for optimisation reasons during playback. The audio engine on the phone can downmix 6 channels to stereo during playback, so you can support 6 channels in the direct play profiles. But if there's transcoding anyway, converting directly to 2 channels on the server lowers CPU usage on the client: since some transcoding will already happen, it's better to have the server do the work than the phone.

Make it clear we don’t expect multiple values in the CodecProfile structs.

Yes

It looks like Jellyfin may mix up container names with file extensions; "opus" would be a valid container, considered to be "ogg". Not sure we want this.

The values are normally the ffmpeg values; if a client sends invalid or unknown values they should just be ignored.

For maxAudioBitrate and maxTranscodingAudioBitrate, I guess no value means no limit (should be written down)? Or make it mandatory but 0 means no limit?

0 means no limit as some clients will always encode fields. We can either make them mandatory or not as people prefer.
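
A minimal sketch of that convention on the server side. The helper name is hypothetical, and treating a missing value the same as 0 is an assumption here, since whether the fields are mandatory was still open at this point.

```python
from typing import Optional


def within_limit(bitrate: int, max_bitrate: Optional[int] = None) -> bool:
    """Check a bitrate against a client-supplied maximum.

    0 means "no limit"; a missing value (None) is assumed to mean the same.
    """
    if not max_bitrate:
        return True
    return bitrate <= max_bitrate
```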

Not sure about offset in the getTranscodeDecision endpoint since we can also set it in getTranscodeStream. Is the latter an offset to apply on top of the first one? An override? I’d just remove offset from getTranscodeDecision (it’s not part of the decision anyway).

Yes it's a leftover before moving to just a transcodeParams and not a full url.

"transcodeReasons" is an array, but it’s not clear which reason applies to which direct play profile or codec profile.
Looks like we also need AudioBitdepthNotSupported, which is currently missing.

Yes, there are probably some errors missing. IMO a raw string is enough, as in all cases this will be more for the dev than for exposing nice messages to users.

Member

@epoupon epoupon left a comment

Thanks for the update!

epoupon
epoupon previously approved these changes Nov 8, 2025
@Tolriq Tolriq marked this pull request as ready for review November 8, 2025 14:34
epoupon
epoupon previously approved these changes Nov 8, 2025
@Tolriq Tolriq requested a review from kgarner7 November 8, 2025 19:06
@Tolriq
Member Author

Tolriq commented Nov 8, 2025

@opensubsonic/servers @opensubsonic/clients The proposal is updated. It's present in an LMS build and proven to work, addressing this important missing part for OS.

@Tolriq
Member Author

Tolriq commented Nov 29, 2025

And actually this also applies to the comma separated codecs and the * if we follow your logic against parsing of the strings.

If you prefer I can use comma separated values to match the rest of the proposal, else I'll ask all others to vote on the split versus changing everything.

@kgarner7
Contributor

kgarner7 commented Nov 29, 2025

You can already have conflicting limitations by just specifying <= number and > number for the same field. If you absolutely want to eliminate the chance of conflicting limitations, then you would need to do something like the following, where each field is an optional single rule.

{
    "audioChannels": { "comparison": "LessThan", "value": 20 },
    "audioBitrate": {},
    "audioProfile": {},
    "audioSamplerate": {},
    "audioBitdepth": {}
}

Of course, if you want to support multiple separate limitations for the same value (e.g., one that is required and one that is not), then this alternate schema would prevent that.
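
To illustrate the conflict case with a flat list of limitations, here is a rough sketch of how a server could detect rules that can never be satisfied together. The flat list shape with a "name" key, the comparison names other than LessThan, and the integer-only values are all assumptions made for this example.

```python
def find_conflicts(limitations: list) -> list:
    """Return the names of fields whose rules leave no allowed integer value."""
    bounds = {}  # name -> (lowest allowed, highest allowed)
    for lim in limitations:
        lo, hi = bounds.get(lim["name"], (float("-inf"), float("inf")))
        value, comparison = lim["value"], lim["comparison"]
        if comparison == "LessThan":
            hi = min(hi, value - 1)
        elif comparison == "LessThanEqual":
            hi = min(hi, value)
        elif comparison == "GreaterThan":
            lo = max(lo, value + 1)
        elif comparison == "GreaterThanEqual":
            lo = max(lo, value)
        bounds[lim["name"]] = (lo, hi)
    # A field conflicts when its lower bound ends up above its upper bound.
    return sorted(name for name, (lo, hi) in bounds.items() if lo > hi)
```

With `<= 2` and `> 2` both given for audioChannels, the function reports that field as conflicting.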

@Tolriq
Member Author

Tolriq commented Nov 29, 2025

When everything is in the same list, we can apply the same simple rule as for the direct play and transcoding profiles: take the first one. Nothing complicated or fancy.

As for polymorphism, this is mostly to simplify the CLIENT side, so that clients do not have to write complex code to put the proper data in the proper field to generate the proper JSON at the end.

At some point we need to be logical about who will use the API and how. It's the clients that need to rebuild the profiles from their player support, and it's even worse when dealing with UPnP and the custom profiles that users will need to provide for their specific devices.

That's what matters: an API that is readable, that people can use, and that we can maintain without breaking changes.

@kgarner7
Contributor

When all is in the same list, you still have problems of incompatible rules, because the same item can be specified multiple times regardless. At the end of the day, you are trying to encode a list for certain parameters. In JSON, there's a type for that: it's called an array.

At this point, I don't really care about the comparisons either way. I'd love for = and != to be formally expressed as an array in the schema, but /shrug/. That being said, at least for the other types that are always a list, you should make it so. And then, rather than say *, just make the list empty, since as I understand, the point of the container, audioCodec and protocol lists is to restrict the allowable fields.
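
The "empty list instead of *" idea could look like this on the server side. It is only a sketch of the suggestion above; the helper name is hypothetical and this is not necessarily what the final extension specifies.

```python
def is_allowed(value: str, allowed: list[str]) -> bool:
    """An empty list restricts nothing; otherwise the value must be listed."""
    return not allowed or value in allowed
```

The same check would apply to the container, audioCodec and protocol lists alike.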

@Tolriq Tolriq dismissed stale reviews from paulijar and epoupon via 156672c November 30, 2025 08:35
@Tolriq Tolriq force-pushed the transcoding branch 4 times, most recently from 282d966 to 381410e Compare November 30, 2025 10:13
epoupon
epoupon previously approved these changes Nov 30, 2025
Member

@epoupon epoupon left a comment

Thanks a lot for all this work!

@kgarner7
Contributor

Thanks for the changes. Please do also add

opensubsonic:
- Extension

for the endpoints for tracking

kgarner7
kgarner7 previously approved these changes Nov 30, 2025
epoupon
epoupon previously approved these changes Nov 30, 2025
@Tolriq Tolriq enabled auto-merge (squash) November 30, 2025 17:49
@Tolriq Tolriq requested a review from paulijar November 30, 2025 18:08
Member

@paulijar paulijar left a comment

With this one final typo fixed, the PR can be approved.

New extension to have proper transcoding solution in OpenAPI.
@Tolriq Tolriq dismissed stale reviews from epoupon and kgarner7 via 7c33949 November 30, 2025 20:36
@Tolriq Tolriq disabled auto-merge December 1, 2025 07:20
@Tolriq Tolriq merged commit 04d932e into opensubsonic:main Dec 1, 2025
4 checks passed

8 participants